Red Teaming GPT-4 Was Valuable. Violet Teaming Will Make It Better

WIRED

Last year, I was asked to break GPT-4: to get it to output terrible things. I and other interdisciplinary researchers were given advance access and attempted to prompt GPT-4 to show biases, generate hateful propaganda, and even take deceptive actions, in order to help OpenAI understand the risks it posed so they could be addressed before its public release. This is called AI red teaming: attempting to get an AI system to act in harmful or unintended ways. Red teaming is a valuable step toward building AI models that won't harm society.

Aviv Ovadya consults for funders and companies on AI governance and is an affiliate with Harvard's Berkman Klein Center and GovAI.